HW 3: Drafting Viz

Author

Carmen Hoyt

Published

February 19, 2025

Description

  1. Which option do you plan to pursue?

I plan to pursue Option 1.

  1. Restate your questions:

What is the makeup of emissions in the US?

    1. What are the major sources of pollution?
    1. What sectors do these belong to?
    1. Which pollutants are most prominent?
    1. How do emissions differ by state?
  1. Explain which variables from your data set(s) you will use to answer your question(s), and how.

To answer my first two questions, I need to group the cleaned dataset by source_description and sum emissions_tons to achieve total emissions by source. Then, I can investigate the different sectors contributing to air pollution by grouping the dataset by eis_sector and summing emissions_tons. However, some of these sectors are broken down even further. I am only interested in the overarching sector; so, to consolidate, I will need to assign the overarching sector with a new variable sector. Then, I can group by sector and sum emissions_tons again for a better idea of emissions by sector.

To answer my third question, I can group the dataset by pollutant_type and sum emissions_tons. This will give me the breakdown of emissions by GHG, HAP, CAP and CAP/HAP. OR I can group the dataset by pollutant and sum emissions_tons. This is what I have created in my mockup, but I will have to brainstorm how to represent the very small pollutants. Perhaps I only visualize the top 5 or 10.

Finally, I will sum emissions_tons by state. However, I think this might be more meaningful if I also normalize by state area. If i join my data with the state.area data in R, I can divide total emissions by area (in square miles) and end up with emissions (tons) per square mile for each state.

  1. Borrowed visualizations:
  • Where Do Emissions Come From? I really like how the overarching categories (“Energy”, “Industrial processes”, “Agriculture, etc.” and “other) are further broken down and represented with a hue in the circular bar chart. I would like to borrow this framework to visualize the breakdown of eis_sector for my dataset (if I keep the subcategories).

  • Pollutant Infographic I like how the size of the clouds represents the pollutant_type variables. I was thinking of representing “which pollutant types are most prominent” with clouds, so it would be very similar to this but by total emissions (in tons) instead of percentages.

  1. Hand drawn visualizations

Hand drawn mock-up for emissions infographic.

  1. Mock-up
Expand-code
# load packages
library(tidyverse)
library(here)
library(ggwordcloud)
library(geofacet) 
library(paletteer)
library(showtext)
library(patchwork)

# load data
emissions_cleaned <- read_csv(here("data", "emissions_cleaned.csv"))

# get colors
my_colors <- paletteer::paletteer_d("palettetown::gloom")

showtext_auto()

# import google fonts
font_add_google(name = "DM Serif Display", family = "dm_serif") # titles
font_add_google(name = "IBM Plex Sans Condensed", family = "ibm_condensed")
titletext <- "ibm_condensed"
font_add_google(name = "DM Sans", family = "dm_sans") # text
font_add_google(name = "IBM Plex Sans", family = "ibm_plex")
subtext <- "ibm_plex"
Sources
# emissions by source (bar chart)
sources <- emissions_cleaned %>%
  group_by(source_description) %>%
  summarise(emissions_tons = sum(emissions_tons))

# bar chart
stack <- ggplot(sources, aes(source_description, emissions_tons)) +
  
  # choose "smokestack" esc color
  geom_col(fill = "#D0D0D0FF") +
  
  # label bars with emissions
  geom_text(aes(label = scales::label_comma(accuracy = 1, suffix = " t")(emissions_tons)),
            vjust = - 0.5, 
            size = 6,
            family = subtext,
            fontface = "bold") +
  
  # set base theme
  theme_void() +
  
  # adjust theme
  theme(
    # add x axis text back
    axis.text.x = element_text(family = subtext,
                               face = "bold",
                               size = 17,
                               
                               # move closer to bars
                               margin = margin(t = -10,
                                               b = 10)),
    
    # extend plot margin at top
    plot.margin = margin(t = 20)
  )

stack

A bar graph showing 3 emissions sources (non-point, nonroad, and onroad) showing that onraod procudes the most emissions by tons.

Sector
# emissions by sector
sector <- emissions_cleaned %>%
  group_by(eis_sector) %>%
  summarise(emissions_tons = sum(emissions_tons)) %>%
  # combine subsectors
  mutate(sector = case_when(
    str_detect(eis_sector, "Agriculture") ~ "Agriculture",
    str_detect(eis_sector, "Biogenics") ~ "Biogenics",
    str_detect(eis_sector, "Bulk Gasoline") ~ "Bulk Gasoline Terminals",
    str_detect(eis_sector, "Commercial Cooking") ~ "Commercial Cooking",
    str_detect(eis_sector, "Dust") ~ "Dust",
    str_detect(eis_sector, "Fires") ~ "Fires",
    str_detect(eis_sector, "Fuel Comb") ~ "Fuel Comb",
    str_detect(eis_sector, "Gas Stations") ~ "Gas Stations",
    str_detect(eis_sector, "Industrial Processes") ~ "Industrial Processes",
    str_detect(eis_sector, "Miscellaneous") ~ "Misc",
    str_detect(eis_sector, "Mobile") ~ "Mobile",
    str_detect(eis_sector, "Solvent") ~ "Solvent",
    str_detect(eis_sector, "Waste Disposal") ~ "Waste Disposal"
  )) %>%
  group_by(sector) %>%
  summarise(emissions_tons = sum(emissions_tons)) %>%
  mutate(label = paste0(sector, " (", round((emissions_tons/sum(emissions_tons))*100, 0), " %)"))

# donut chart
ggplot(sector, aes(x = 2, y = emissions_tons, fill = label)) +
  geom_bar(stat = "identity", width = 1) +
  
  # use polar coordinates
  coord_polar(theta = "y", start = 0) + 
  
  # set base theme
  theme_void() +  
  
  # create hole
  xlim(0.5, 2.5) +
  
  # set legend
  theme(
        legend.position = "right",
        
        legend.title = element_text(family = subtext,
                                    size = 15,
                                    face = "bold"),
        
        legend.text = element_text(family = subtext,
                                   size = 10)
        
        ) + 
  labs(fill = "Sector") +
  
  # use gloom palette
  scale_fill_paletteer_d("palettetown::gloom")

A donut chart showing the breakdown of emissions in tons by sector, with the mobile sector comprising the majority.

Sector
# remove mobile sector for further analysis
sector %>%
  filter(!sector == "Mobile") %>%
  
  # generate new donut chart
  ggplot(aes(x = 2, y = emissions_tons, fill = label)) +
  geom_bar(stat = "identity", width = 1) +
  
  # use polar coordinates
  coord_polar(theta = "y", start = 0) + 
  
  # set base theme
  theme_void() +  
  
  # create hole
  xlim(0.5, 2.5) + 
  
  # set legend
  theme(
         legend.position = "right",
        
        legend.title = element_text(family = subtext,
                                    size = 15,
                                    face = "bold"),
        
        legend.text = element_text(family = subtext,
                                   size = 10)
        
        ) + 
  labs(fill = "Sector") +
  
  # use gloom palette
  scale_fill_paletteer_d("palettetown::gloom")

A donut chart showing the breakdown of emissions in tons by sector, with the mobile sector comprising the majority.

Sector
# remove mobile and fires
sector %>%
  filter(!sector %in% c("Mobile", "Fires")) %>%
  
  # generate new donut chart
  ggplot(aes(x = 2, y = emissions_tons, fill = label)) +
  geom_bar(stat = "identity", width = 1) +
  
  # use polar coordinates
  coord_polar(theta = "y", start = 0) + 
  
  # set base theme
  theme_void() + 
  
  # create hole
  xlim(0.5, 2.5) +
  
  # set legend
  theme(
    
        legend.position = "right",
        
        legend.title = element_text(family = subtext,
                                    size = 15,
                                    face = "bold"),
        
        legend.text = element_text(family = subtext,
                                   size = 10)
  ) + 
  
  labs(fill = "Sector") +
  
  # use gloom palette
  scale_fill_paletteer_d("palettetown::gloom")

A donut chart showing the breakdown of emissions in tons by sector, with the mobile sector comprising the majority.

Code
# emissions by sector
sector_2 <- emissions_cleaned %>%
  #filter
  group_by(eis_sector, scc_level_2) %>%
  summarise(emissions_tons = sum(emissions_tons)) %>%
  # combine subsectors
  mutate(sector = case_when(
    str_detect(eis_sector, "Agriculture") ~ "Agriculture",
    str_detect(eis_sector, "Biogenics") ~ "Biogenics",
    str_detect(eis_sector, "Bulk Gasoline") ~ "Bulk Gasoline Terminals",
    str_detect(eis_sector, "Commercial Cooking") ~ "Commercial Cooking",
    str_detect(eis_sector, "Dust") ~ "Dust",
    str_detect(eis_sector, "Fires") ~ "Fires",
    str_detect(eis_sector, "Fuel Comb") ~ "Fuel Comb",
    str_detect(eis_sector, "Gas Stations") ~ "Gas Stations",
    str_detect(eis_sector, "Industrial Processes") ~ "Industrial Processes",
    str_detect(eis_sector, "Miscellaneous") ~ "Misc",
    str_detect(eis_sector, "Mobile") ~ "Mobile",
    str_detect(eis_sector, "Solvent") ~ "Solvent",
    str_detect(eis_sector, "Waste Disposal") ~ "Waste Disposal"
  )) %>%
  filter(sector == "Mobile") %>%
  group_by(scc_level_2) %>%
  summarise(emissions_tons = sum(emissions_tons)) %>%
  mutate(label = paste0(scc_level_2, " (", round((emissions_tons/sum(emissions_tons))*100, 0), " %)"))

# expand mobile sector for further analysis
  # generate new donut chart
 ggplot(sector_2, aes(x = 2, y = emissions_tons, fill = label)) +
  geom_bar(stat = "identity", width = 1) +
  
  # use polar coordinates
  coord_polar(theta = "y", start = 0) + 
  
  # set base theme
  theme_void() +  
  
  # create hole
  xlim(0.5, 2.5) + 
  
  # set legend
  theme(
         legend.position = "right",
        
        legend.title = element_text(family = subtext,
                                    size = 15,
                                    face = "bold"),
        
        legend.text = element_text(family = subtext,
                                   size = 10)
        
        ) + 
  labs(fill = "Sector") +
  
  # use gloom palette
  scale_fill_paletteer_d("palettetown::gloom")

I’d like to represent the sector categories by hue, as suggested in my second “borrowed viz”. This will take some workshopping.

Pollutant Cloud
# emissions by pollutant
pollutant <- emissions_cleaned %>%
  group_by(pollutant) %>%
  summarise(emissions_tons = sum(emissions_tons)) %>%
  arrange(desc(emissions_tons)) %>%
  slice(1:10)

# emissions by pollutant type
pollutant_type <- emissions_cleaned %>%
  group_by(pollutant_type) %>%
  summarise(emissions_tons = sum(emissions_tons))

# cloud plot
cloud <- ggplot(pollutant, aes(label = pollutant, size = emissions_tons)) +
  geom_text_wordcloud(family = titletext) +
  #scale_size_area(max_size = 20) +
  theme_minimal()

cloud

A word cloud graph showing the emissions tons by pollutant, with carbon dioxide comprising the majority.

Map
# Convert built-in state.area to a df
state_data <- data.frame(
  state = state.name,
  abbrv = state.abb,
  area = state.area
)

# Merge datasets using left_join
state <- emissions_cleaned %>%
  left_join(state_data, by = "state") %>%
  group_by(state, abbrv, area) %>%
  summarise(total_emissions = sum(emissions_tons)) %>%
  mutate(rel_emissions = total_emissions/area) %>%
  arrange(desc(rel_emissions)) %>%
  mutate(opacity = rel_emissions/5068.261374)

core = "#F87000FF"
accent = "gray20"

ggplot(state) +

  # initiate a plot with a rectangles, shading by relative observations (opacity value) ----
  geom_rect(aes(xmin = 0, xmax = 1, ymin = 0, ymax = 1, alpha = opacity), 
            fill = core) +
  
  # label with state abbreviation ----
  geom_text(aes(x = 0.5, y = 0.7, label = abbrv), 
            size = 8, 
            family = subtext,
            color = "black") +
  
  # label with observations ----
  geom_text(aes(x = 0.5, y = 0.3, label = round(rel_emissions, 0)), 
            size = 5, 
            family = subtext,
            color = "black")  +

  # break rectangle up by state ----
  geofacet::facet_geo(~state) +

  # make each rectangle the same size ----
  coord_fixed(ratio = 1) +
  
  # add descriptio line as subtitle ----
  labs(title = "Emissions by State",
       subtitle = "Tons per Square Mile",
       caption = "Data Source: EPA National Emissions Inventory 2020") +
  
  # apply a completely empty theme ----
  theme_void() +
  
  # further customize theme ----
  theme(
    
    # remove headers from faceted plots ----
    strip.text = element_blank(),
    
    # adjust the font and color of the title ----
    plot.title = element_text(family = titletext,
                              face = "bold",
                              size = 30, 
                              hjust = 0.5,  
                              margin = margin(t = 10,
                                              b = 10)),
    
    # adjust the font and color of the title ----
    plot.subtitle = element_text(family = subtext,
                                 size = 20, 
                                 hjust = 0.5,  
                                 margin = margin(b = 10)),
    
    # remove legend ----
    legend.position = "none",
    
    plot.margin = margin(b = 10)
  )

A map showing emissios in tons per square mile for each state in the US, with New Jersey producing the most.

Extra code
# emissions by sector and pollutant type
sector_breakdown <- emissions_cleaned %>%
  group_by(eis_sector, pollutant_type) %>%
  summarise(emissions_tons = sum(emissions_tons)) %>%
  mutate(sector = case_when(
    str_detect(eis_sector, "Agriculture") ~ "Agriculture",
    str_detect(eis_sector, "Biogenics") ~ "Biogenics",
    str_detect(eis_sector, "Bulk Gasoline") ~ "Bulk Gasoline Terminals",
    str_detect(eis_sector, "Commercial Cooking") ~ "Commercial Cooking",
    str_detect(eis_sector, "Dust") ~ "Dust",
    str_detect(eis_sector, "Fires") ~ "Fires",
    str_detect(eis_sector, "Fuel Comb") ~ "Fuel Comb",
    str_detect(eis_sector, "Gas Stations") ~ "Gas Stations",
    str_detect(eis_sector, "Industrial Processes") ~ "Industrial Processes",
    str_detect(eis_sector, "Miscellaneous") ~ "Misc",
    str_detect(eis_sector, "Mobile") ~ "Mobile",
    str_detect(eis_sector, "Solvent") ~ "Solvent",
    str_detect(eis_sector, "Waste Disposal") ~ "Waste Disposal"
  )) %>%
  group_by(sector, pollutant_type) %>%
  summarise(emissions_tons = sum(emissions_tons))
  1. Questions:
  1. I ran into a few challenges with my visualizations. Ultimately, I decided that including subgroups in the donut charts would be too overwhelming. Perhaps I can use the donut chart that breaks the “Mobile” sector down by scc_level_2 instead. Additionally, instead of including clouds sized by total emissions, I decided to make a word cloud of pollutants, representing emissions in tons by text size. Something else I was thinking about is normalizing emissions by population size instead of state area; I am not sure if there is a preferred method.

  2. I have been using ggwordcloud which was not covered in class but was very easy to learn independently. Everything else I have adapted from class!

  3. I would welcome any feedback on the read-ability of my plots (specifically on the text size, placement, and fonts). I would also be interested to learn if my color palette is working and clear to the audience. I chose it for the orange-blue contrast and hoped that it would be color-blind friendly.